fix: guard handle_read against closed socket race (#614) by fruch · Pull Request #902 · scylladb/python-driver

fruch · 2026-06-11T14:50:43Z

Minimal fix for issue #614 — Bad file descriptor

2-line change. Adds if self.is_closed: return at the top of LibevConnection.handle_read().

Root Cause

Thread race between close() and handle_read():

close() calls self._socket.close() synchronously (from any thread)
Watcher stop is asynchronous (batched in _loop_will_run(), next loop iteration)
If the reactor has already dispatched handle_read() for that connection before the watcher is stopped, recv() hits a closed FD → EBADF

Thread A (app):    close() → _socket.close()     [immediate]
Thread B (reactor): handle_read() → recv()       [EBADF - socket already closed]
                    ...
                    _loop_will_run() → watcher.stop()  [too late]

The Fix

 def handle_read(self, watcher, revents, errno=None):
+    if self.is_closed:
+        return
     if revents & libev.EV_ERROR:

Why this is safe

is_closed is set under self.lock in close() before _socket.close() is called
Reading a boolean attribute is atomic in CPython (GIL guarantees)
If is_closed is True, the socket is either already closed or about to be — no useful work can be done

Reproducer

100% reproduction rate: https://github.com/scylladb/scylla-dtest/pull/7079

Note on existing fix branches

fix/factory-close-race-614 — fixes factory returning dead connections (different race)
fix/asyncio-close-race-614 — fixes AsyncioConnection race (different reactor)

Neither adds the handle_read guard for LibevConnection. This PR is complementary.

How to test with SCT/Hydra

# In SCT requirements, point python-driver to this branch:
pip install git+https://github.com/scylladb/python-driver.git@fruch/fix-614-handle-read-guard

# Or in hydra, override the driver version in the test config

@vponomaryov — could you try building hydra with this branch and check if it fixes the EBADF reports in longevity runs? This is the minimal possible fix (2 lines). If it works, we can merge this independently of the other #614 fix branches.

Closes #614

Add is_closed check at the top of LibevConnection.handle_read() to prevent recv() on an already-closed socket. Root cause: close() is called from any thread (main/app) and does _socket.close() synchronously, but the libev reactor thread may already be inside handle_read() for that connection (dispatched before _loop_will_run batches the watcher stop). Without this guard, recv() on the closed FD raises OSError [Errno 9] Bad file descriptor. This is the minimal fix for issue scylladb#614. The race is inherent in the threading model (single reactor thread shared across all sessions, close callable from any thread). The watcher stop is asynchronous (batched in _loop_will_run), but socket.close() is synchronous — this guard bridges the gap. Reproducer: scylladb/scylla-dtest#7079

coderabbitai · 2026-06-11T15:08:54Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3bd2400c-2235-4161-98c0-c950a898d977

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

vponomaryov · 2026-06-12T16:17:44Z

I ran the CI job here: https://argus.scylladb.com/tests/scylla-cluster-tests/59720349-680a-44ed-bd9f-109ddf28657f/results
But list of nemesis has changed and the picked ones are questionnable for triggering the connections closings.

So, I re-triggered it with different nemesis seed: scylla-staging/valerii/vp-longevity-100gb-4h-oci-test#38

vponomaryov mentioned this pull request Jun 12, 2026

fix(py-drv): use fixed python driver with proper reconnections - v3 scylladb/scylla-cluster-tests#15020

Draft

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: guard handle_read against closed socket race (#614)#902

fix: guard handle_read against closed socket race (#614)#902
fruch wants to merge 1 commit into
scylladb:masterfrom
fruch:fruch/fix-614-handle-read-guard

fruch commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026

Review skipped

Uh oh!

vponomaryov commented Jun 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fruch commented Jun 11, 2026

Minimal fix for issue #614 — Bad file descriptor

Root Cause

The Fix

Why this is safe

Reproducer

Note on existing fix branches

How to test with SCT/Hydra

Uh oh!

coderabbitai Bot commented Jun 11, 2026

Review skipped

Uh oh!

vponomaryov commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vponomaryov commented Jun 12, 2026 •

edited

Loading